Educational attainment and intergenerational mobility: A polygenic score analysis

Aldo Rustichini, William G. Iacono, James J. Lee, and Matt McGue| Journal of Political Economy 2023 131:10, 2724-2779

伊藤成朗

Summary of this paper

  • US Minnesota Twin Study Sample
  • Build a genetic model of human capital investments with genotype transmission (with better genetics foundation than ad hoc standard econ models)
  • Intergenerational elaciticity of income is predicted to be larger in genetic models
  • Empirical pathway: Through parental education and incomes, parental PGS has no direct effect on child human capital
  • Child PGS is relevant
  • Genetic nurture (introverted genotype produces an environment to grow introverted child, independently of income and education) is not found

Background

SNP, polygenic score

.
└── Chromosome
    └── DNA
        └── Gene
            └── Genome
  • Chromosomes: 1/2 each from both parents
  • DNA: Rolled up and shapes like an X
  • Gene: DNA segments
  • Genome: 3 billion nucletiodes (letters)
    • Letter sequence = genome sequence
    • 1.5 billion base pairs (alelles, 対立遺伝子)
  • Base pairs differ between people in less than 1% (15 million) locations
  • Single-nucleotide polymorphism (SNP) = such location
    • 2 chromosomes per person → AT-AT, GC-GC, AT-GC or 0, 2, 1 (reference = GC)

Twin studies

  • Estimate how much genetic factors collectively matter for explaining variations of a trait
    • Collectively: Unit = group level aggregation
    • Explaining variations: Variance decomposition
  • Do not reveal which SNP is correlated

Genome wide association studies (GWASs)

  • Collect \(M\) observable (?) SNPs of (family \(i\)) individual \(j\).

Regress outcome \(y^{i}_{j}\) on \(m\)-th SNP for all \(m=1, \dots, M\):

\[ y^{i}_{j} = \beta_{m}SNP^{i}_{jm}+\epsilon^{i}_{jm}, \quad m=1,\dots,M. \]

Remember:

  • Environment is endogenous to the trait in SNP regressions
    • \(\cov[SNP^{i}_{jm}, \epsilon^{i}_{jm}]> 0\): Parents see traits which may be correlated with \(SNP^{i}_{jm}\) and decide on investments
    • Estimated \(\beta_{m}\) should be upwardly biased
  • No attempts to correct for this

Polygenic score = Predicted outcomes based on all relevant SNPs

\[ PGS^{i}_{j}=\sum_{m=1}^{M}\tilde{\beta}_{m} SNP^{i}_{jm} \]

  • Use Bayesian LDpred procedure to correct for correlations in \(\tilde{\beta}_{m}\)
    • Barth, Papageorge, and Thom (2020) follow the convention and use all SNPs: Better out-of-sample results than using only SNPs with genome-wide significance \(p\) value \(< 5*10^{-8} =\) .0000005%
  • PGS is considered to be a predictor of individual fixed effects
    • Edu attainment SNPs \(\SIM\) biological process of brain development, cognition (Okbay et al. 2016; Lee et al. 2018)
    • Edu attainment SNPs \(\SIM\) cognition SNPs (Okbay et al. 2016)
    • HRS sample: cov[EA score, schooling years] / variance[schooling years] = 10.6%

Intergenerational skill formation

  • Becker and Tomes (1979): AR(1)
  • Goldberger (1989): Weighted average of all ancestors
  • Both are incorrect, this paper shows there exists an invariant (stable) mapping father genotype distribution × mother genotype distribution \(\to\) child genotype distribution, so skill formation depends on parental genotype distributions
flowchart LR
  A1[father] --> A3(couple)
  A2[mother] --> A3(couple)
  A3[couple] --> B("child
   genotype")
  A3[couple] --"genetic nurture"--> C("child
   environment")
  B("child
   genotype") --> D("child
    education")
  C("child
   environment") <--"gene×environ"--> B("child
    genotype")
  C("child
   environment") --> D("child
    education")
Figure 1: How parents affect child education outcomes

Model

Family \(i\) parents’ problem with twins \(j=1, 2\) of observable skills \(\theta^{i}_{j}\) \[ \begin{aligned} \max_{\{E^{i}, I^{i}_{1}, I^{i}_{2}\}} \;\;\;&\E\left[(1-\delta)\ln E^{i}+\delta\left(y^{i}_{1}+y^{i}_{2}\right)\left|\theta^{i}_{1}, \theta^{i}_{2}\right.\right]\\ \st \;\;\; & E^{i}+I^{i}_{1}+I^{i}_{2}=e^{y^{i}} \end{aligned} \tag{6} \] \[ \begin{align} \scriptsize{\mbox{Human capital production function}} && h^{i}_{j}&=\alpha_{I}\ln I^{i}_{j}+\alpha_{\theta}\theta^{i}_{j}+\epsilon^{\mathrm{h}i}_{j} && \phantom{mmmmmmmmmmmm} \tag{8}\label{eq8}\\ \scriptsize{\mbox{Child income}} && y^{i}_{j}&=\alpha_{\mathrm{h}}h^{i}_{j}+\epsilon^{\mathrm{y}i}_{j} &&\tag{9}\label{eq9} \end{align} \] Solution \[ I^{i}=\frac{\delta\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}}{1-\delta+2\delta\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}}e^{y^{i}} \tag{11} \] Optimised child income \[ y^{i}_{j}=\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}\ln \frac{\delta\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}}{1-\delta+2\delta\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}} +\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}y^{i}+\alpha_{\theta}\alpha_{\mathrm{h}}\theta^{i}_{j}+\alpha_{\mathrm{h}}\epsilon^{\mathrm{h}i}_{j}+\epsilon^{\mathrm{y}i}_{j} \tag{12}\label{eq12} \]

Matching of parents

  • Assume independence of genotype conditional on observable characteristics [incomes, skills \(w(g)\)]
    • Matches are made on observables, not on genotype
    • Matches are random within obserbable strata “worth” classes=\(w_{\mathrm{y}}y+w(g)\)]
      • Genotype matching is random within a stratum
    • Matches are nonrandom at the population due to population stratification based on observables

Genetic transmission: \(\Delta\)(father genotype) × \(\Delta\)(mother genotype) \(\to\) \(\Delta\)(child genotype)

The appendix D of paper shows an invariant mapping exists with properties shown in Theorem 3.1

Skills production function (genotype enters here, hence the parental genotypes) \[ \theta^{i}_{j}=w\left(g^{i}_{j}\right)+\pi y^{i}+\bfPi\bfx^{i}_{j}+F^{i}+\epsilon^{\theta i}_{j} \tag{4}\label{eq4} \] Estimating equations: Substitute [4] into [12] gives child income decomposition, and [11] into [8] gives human capital decomposition \[ \begin{align} y^{i}_{j} &= \alpha_{\theta}\alpha_{\mathrm{h}}w\left(g^{i}_{j}\right) +\left(\alpha_{\mathrm{I}}\alpha_{\mathrm{h}}+\alpha_{\theta}\alpha_{\mathrm{h}}\pi\right)y^{i} +\alpha_{\theta}\alpha_{\mathrm{h}}F^{i}\\ &\hspace{1em} +\alpha_{\theta}\alpha_{\mathrm{h}}\bfPi\bfx^{i}_{j} +\alpha_{\theta}\alpha_{\mathrm{h}}\epsilon^{\theta i}_{j} +\alpha_{\mathrm{h}}\epsilon^{\mathrm{h} i}_{j} +\epsilon^{\mathrm{y}i}_{j}\tag{21}\label{eq21}\\ h^{i}_{j} &= \alpha_{\theta}w\left(g^{i}_{j}\right) +\left(\alpha_{\mathrm{I}}+\alpha_{\theta}\pi\right)y^{i} +\alpha_{\theta}F^{i}\\ &\hspace{1em} +\alpha_{\theta}\bfPi\bfx^{i}_{j} +\alpha_{\theta}\epsilon^{\theta i}_{j} +\epsilon^{\mathrm{y}i}_{j}\tag{23}\label{eq23} \end{align} \]

Can show IGE is smaller in “standard model” than genetic model

Becker and Tomes (1979) \[ \begin{align} \theta_{t+1} &= \eta \theta_{t}+\epsilon^{\theta}_{t+1}\tag{26}\label{eq26}\\ y^{i}_{t+1} &= \alpha_{\mathrm{I}}\alpha_{\mathrm{h}}y^{i}_{t}+\alpha_{\theta}\alpha_{\mathrm{h}}\theta_{t+1}+\epsilon^{\mathrm{y}i}_{t+1}\tag{25}\label{eq25} \end{align} \] Genetic model of this paper assumes \[ \begin{align} \theta &= w^{\theta}_{\mathrm{f}}\theta_{\mathrm{f}}+w^{\theta}_{\mathrm{m}}\theta_{\mathrm{m}}\tag{A4}\label{eqA4}\\ y^{i} &= w^{\mathrm{y}}_{\mathrm{f}}y^{i}_{\mathrm{f}}+w^{\mathrm{y}}_{\mathrm{m}}y^{i}_{\mathrm{m}}\tag{A3}\label{eqA3} \end{align} \] Plugging these give \[ \begin{align} \theta_{t+1} &= \eta \sum_{s=m,f}w^{\theta}_{s}\theta_{st}+\epsilon^{\theta}_{t+1}\tag{33}\label{eq33}\\ y^{i}_{t+1} &= \alpha_{\mathrm{I}}\alpha_{\mathrm{h}}\sum_{s=m,f}w^{\mathrm{y}}_{s}y^{i}_{st}+\alpha_{\theta}\alpha_{\mathrm{h}}\theta_{t+1}+\epsilon^{\mathrm{y}i}_{t+1}\tag{32}\label{eq32} \end{align} \]

Estimation strategy

Structural equation model (SEM)

  • Structural equations
  • Measurement (error) equations

Identification conditions of structural parameters remain the same as in simultaneous equation models

In this paper:

  • identification for only recursive model is discussed (Tab 2)
  • use of IVs is not discussed (Tab 3)

Controls used in estimation

  • 10 principal components of PGS
  • Parent-child age difference

Identification ideas

  1. Dizygotic twins [Table 1, 2]
    • Share birth days, environment
    • Differ in genotypes
    • Ideal to see impacts of genotypes on outcomes

Identification ideas

  1. SEM
    • Recursive system (37)-(42) are identified as usual [Table 2]
    • Structural parameters of endogenous variables are recovered from [eq43]-[eq45] (how?) [Table 3]
      • Given PGS of child, PGS of parents should be redundant, so long as they do not affect environment (gene-environment correlation, or “passive rGE”)

\[ \begin{align} e^{i}_{\mathrm{h}} &= \alpha_{e_{\mathrm{h}}}+ \gamma^{\mathrm{pGS}_{m}}_{e_{\mathrm{h}}}\mathrm{pGS}^{i}_{m}+\gamma^{\mathrm{pGS}_{f}}_{e_{\mathrm{h}}}\mathrm{pGS}^{i}_{f}+\xi_{e_{\mathrm{h}}}\tag{43}\label{eq43}\\ y^{i}_{\mathrm{h}} &= \alpha_{y_{\mathrm{h}}}+\gamma^{\mathrm{pGS}_{m}}_{y_{\mathrm{h}}}\mathrm{pGS}^{i}_{m}+\gamma^{\mathrm{pGS}_{f}}_{y_{\mathrm{h}}}\mathrm{pGS}^{i}_{f}+\xi_{y_{\mathrm{h}}}\tag{44}\label{eq44}\\ e^{i}_{j} &= \alpha_{e}+\gamma^{e_{\mathrm{h}}}_{\mathrm{e}}e_{\mathrm{h}}^{i} +\gamma^{y_{\mathrm{h}}}_{\mathrm{e}}y_{\mathrm{h}}^{i} +\gamma^{\mathrm{pGS}}_{\mathrm{e}}\mathrm{pGS}^{i}_{j} +\gamma^{\mathrm{pGS}_{m}}_{\mathrm{e}}\mathrm{pGS}^{i}_{m} +\gamma^{\mathrm{pGS}_{f}}_{\mathrm{e}}\mathrm{pGS}^{i}_{f} +\xi_{\mathrm{e}}\tag{45}\label{eq45} \end{align} \]

Data: Minnesota Twin Study

  • Cognitive measures to estimate IQ
    • Wechsler Intelligence Scale for Children-Revised (WISC-R)
    • Wechsler Adult Intelligence Scale-Revised (WAIS-R)
  • Noncognitive measures
    • MPQ (Mulitdimensional Personality Questionnaire)
      • PA (positive emotionality of affectivity; well-being, social potency, achievement, social closeness)
      • NA (negative emotionality of affectivity; stress reaction, alienation, aggression)
      • constraint (control, harm avoidance, traditionalism)
    • DSM-IV (Diagnostic and Statisitcal Manual of Mental Disorders, fourth edition)
      • Externalizing: defiant disorder, conduct disorder, adult antisocial behavior
      • Academic effort: 8 items answered by twin mothers
      • Academic problems: 3 items answered by twin mothers
  • Family background
    • Hollingsworth scale on occupational status
    • Four year college degrees of parents

Estimation results

IGE

IGE
  • IGE = .13 is small relative to previous studies (.2 ~ .3)
    • Attenuation (measurement errors in family incomes)
    • European population has lower IGEs, Sweden = .125
    • Reduced by -482% when PGS and mediators are added
  • Effects of PGS
    • = .078 in (2)
    • = 0 in (3) when mediators of PGS are added

recursive SEM

recursive SEM

Education years↑ with

  • Cognitive ability, noncognitive ability
    • PGS of child, parents are not related when parental education, family income are added
  • Parental income, education
    • Impacts are twice as large for parental education

Cognitive ability↑ with

  • PGS

Noncognitive ability↑ with

  • PGS, smaller magnitude

full SEM

full SEM

Dizygotic (fraternal) + monozygotic (identical) twins

Education years↑ with

  • PGS of child and mother
    • PGS of father is unrelated
  • Parental income, education
    • Magnitude is larger for parental education

Parental education years↑ with

  • PGS of parents
    • Magnitude is larger for father PGS

Family income↑ with

  • PGS of parents
    • Magnitude is larger for father PGS

  • PGS of parents are not related to outcomes once IQ, soft skills index, parental education and family income are added
  • Most parental impacts on education are through incomes and parental education
  • No other indirect pathway remain from parental PGS, after parental education and incomes are addressed
    • No passive rGE other than parental education and incomes (genetic nurture) is found

感想

  • 咀嚼できていません
  • 人的資本投資を内生化したsteady state genetic transmission mappingの存在を証明
  • それと実証部分の結びつきが弱い(理解できていない)
  • 親の教育と所得を考慮すると、遺伝情報は子の教育に影響しないことを実証的に示したのは斬新
  • SEMはidentification assumptionが不明(recursiveモデル以外)、IVを使っていない?
  • Multiple testing気にせず

References

Barth, Daniel, Nicholas W. Papageorge, and Kevin Thom. 2020. “Genetic Endowments and Wealth Inequality.” Journal of Political Economy 128 (4): 1474–1522. https://doi.org/10.1086/705415.
Becker, Gary S., and Nigel Tomes. 1979. “An Equilibrium Theory of the Distribution of Income and Intergenerational Mobility.” Journal of Political Economy 87 (6): 1153–89. https://doi.org/10.1086/260831.
Goldberger, Arthur S. 1989. “Economic and Mechanical Models of Intergenerational Transmission.” The American Economic Review 79 (3): 504–13. http://www.jstor.org/stable/1806859.
Lee, James J, Robbee Wedow, Aysu Okbay, Edward Kong, Omeed Maghzian, Meghan Zacher, Tuan Anh Nguyen-Viet, et al. 2018. “Gene Discovery and Polygenic Prediction from a Genome-Wide Association Study of Educational Attainment in 1.1 Million Individuals.” Nature Genetics 50 (8): 1112–21.
Okbay, Aysu, Jonathan P Beauchamp, Mark Alan Fontana, James J Lee, Tune H Pers, Cornelius A Rietveld, Patrick Turley, et al. 2016. “Genome-Wide Association Study Identifies 74 Loci Associated with Educational Attainment.” Nature 533 (7604): 539.